DIGITAL RECORDING AND PLAYBACK SYSTEM
WITH VOICE RECOGNITION CAPABILITY 
FOR CONCURRENT TEXT GENERATION 


RELATED U.S. APPLICATIONS 

This application is a continuation of and claims priority to U.S. Patent No.
6,754,619 entitled "Digital Recording And Playback System With Voice
Recognition Capability For Concurrent Text Generation," by Nakatsuyama filed
on 11/15/99, which is incorporated herein by reference.



BACKGROUND OF THE INVENTION 
Field of the Invention 
The present invention relates to the design of digital recording and
playback systems. More specifically, the present invention pertains to the
processing of voice and concurrent generation of corresponding text in a
portable digital appliance.



Related Art

The use of portable digital recording and playback devices is quickly
gaining popularity in business and among individual users. In particular, one
attractive feature of digital recording is the possibility of converting voice
messages into text, which can then be reviewed, revised, incorporated into
documents, or otherwise retrieved for later use. Several models of portable
digital recorders are available in the marketplace today. These prior art
recorders typically record voice messages as compressed digital data, and a
separate computer program is generally required to convert the compressed
digital data into text data. Thus, in the prior art, after a recording session the
user must post-process the compressed digital data to perform the voice-to-text
conversion. This requires additional processing time and, in some cases,
requires the user to transfer the compressed digital data from the portable
device to a personal computer (PC) running the necessary software program
before the conversion can be performed. It is therefore desirable to eliminate the
extra step of post-recording conversion from compressed digital data to text data
in a portable digital recording and playback system.

These prior art devices are also poorly suited to generating text data from
the recorded voice data for another reason. Good voice-to-text conversion
requires a high quality voice input to the voice-to-text conversion engine. In prior
art portable systems, the voice data is highly compressed because portable
systems typically have limited memory capacity, and high compression allows
more voice data to be stored in the limited memory resources. Because the voice
data is stored in a highly compressed format in these portable prior art devices,
the text data that a conversion program generates directly from the compressed
voice data is usually unsatisfactory. As such, it is highly advantageous to have a
portable digital recording and playback system which provides high quality
conversion from voice to text.


Furthermore, portable devices are typically battery-powered, so the need to
conserve power is a major design consideration. While a high capacity storage
device could potentially be used in a large, non-portable device that draws its
power from a power outlet to improve the quality of the conversion from
compressed voice data to text data, it is not a viable option in a portable device.
Therefore, there exists a need for a portable digital recording and playback
system which provides high quality conversion from voice to text and yet does
not require a high rate of power consumption.

SUMMARY OF THE INVENTION

In implementing a viable portable digital recording and playback system, it
is highly desirable to use components that are well known in the art and are
compatible with existing computer systems and other appliances, so that the
cost of realizing the portable digital recording and playback system is low.
Doing so advantageously eliminates the need for costly expenditures on
retrofitting existing computer systems and other appliances or on building
custom components.

Thus, a need exists for a portable digital recording and playback system
which does not require post-recording conversion to generate text data from
compressed digital data. A further need exists for a portable digital recording
and playback system which meets the above need and which provides high
quality conversion from voice to text. Still another need exists for a portable
digital recording and playback system which meets both of the above needs and
which does not require a high level of power consumption. Yet another need
exists for a portable digital recording and playback system which meets all of the
above needs and which is conducive to use with existing computer systems and
other appliances.


Accordingly, the present invention provides a portable digital recording
and playback system which generates text data from voice without requiring
post-recording conversion from compressed digital data to text data. The
present invention further provides a portable digital recording and playback
system which not only performs voice-to-text conversion without post-processing
but also performs that conversion with high quality. Embodiments of the present
invention perform voice-to-text conversion using the high quality audio input
signal rather than highly compressed voice data, so that high quality conversion
is achieved. Moreover, the present invention provides a portable digital recording
and playback system which includes the above features and which conserves
power for full battery operation. Furthermore, embodiments of the present
invention utilize components that are well known in the art and are compatible
with existing computer systems and other appliances, so that the present
invention is conducive to use with existing computer systems and other
appliances. These and other advantages of the present invention not specifically
mentioned above will become clear within discussions of the present invention
presented herein.

SONY-50N3172
Confidential
Clean-Version Substitute Specification (Without Claims)


More specifically, in one embodiment of the present invention, a digital
recording and playback system is provided. In this embodiment, the system
comprises an audio capturing device configured to receive a voice input. The
system also comprises a high compression encoder (HCE) coupled to the audio
capturing device and configured to generate digital wave data corresponding to
the voice input. The system further comprises a voice recognition engine (VRE)
coupled to the audio capturing device and configured to generate text data
corresponding to the voice input. Moreover, in this embodiment, the HCE and
VRE are selectively coupled to a memory sub-system which is configured to
store the digital wave data and the text data. In particular, in this embodiment,
the HCE and the VRE are operable to concurrently generate the digital wave
data and the text data in response to the voice input such that the digital wave
data and the text data can be stored in the memory sub-system in a
synchronized manner. Thus, in this embodiment, the present invention provides
recording capability wherein text data is generated from a voice input without
requiring post-recording conversion. In a specific embodiment, the system
described above is battery-powered.
Additional embodiments of the present invention include the above and
further comprise a decoder selectively coupled to the memory sub-system and
configured to decode the digital wave data into decoded audio data, a
digital-to-analog (D/A) converter coupled to the decoder and configured to convert
the decoded audio data into an analog signal, and an audio output device coupled
to the D/A converter and configured to generate, from the analog signal, a voice
output corresponding to the voice input. Moreover, these embodiments also
comprise a display sub-system selectively coupled to the memory sub-system
and configured to display the text data. Thus, in these embodiments, the
present invention provides simultaneous voice playback and text display.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of 
this specification, illustrate embodiments of the invention and, together with the 
description, serve to explain the principles of the invention: 

Figure 1 is a block diagram illustrating a portable digital recording and 
playback system 100 in accordance with one embodiment of the present 
invention, wherein the system has built-in voice recognition capability for 
concurrent text generation during voice recording. 

Figure 2A is a flow diagram illustrating steps for performing recording 
using system 100 of Figure 1 in accordance with one embodiment of the present 
invention. 

Figure 2B is a diagram illustrating one embodiment of an arrangement of
corresponding portions of voice data and text data as stored in a portable digital
recording and playback system 100 in accordance with the present invention.

Figure 3 is a flow diagram illustrating steps for performing playback using 
system 100 of Figure 1 in accordance with one embodiment of the present 
invention. 

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description of the present invention, a digital
recording and playback system with voice recognition capability for concurrent
text generation, numerous specific details are set forth in order to provide a
thorough understanding of the present invention. However, it will be recognized
by one skilled in the art that the present invention may be practiced without these
specific details or with equivalents thereof. In other instances, well known
methods, procedures, components, and circuits have not been described in
detail so as not to unnecessarily obscure aspects of the present invention.


Exemplary Configuration of a Digital Recording and Playback System of the Present Invention

Figure 1 is a block diagram illustrating a portable digital recording and
playback system 100 in accordance with one embodiment of the present
invention, wherein the system has built-in voice recognition capability for
concurrent text generation during voice recording. In system 100, an audio
capturing device 110 is coupled to a high compression encoder (HCE) 120.
Audio capturing device 110 is also coupled to a voice recognition engine (VRE)
130. Both HCE 120 and VRE 130 are selectively coupled to a memory
sub-system 140 through an intelligent switch 135. More particularly, switch 135 is
operable to couple either HCE 120 or VRE 130, but not both, to memory
sub-system 140 at any given time. In one embodiment, switch 135 is a multiplexer.
In another embodiment, switch 135 is a software switch for data routing. In an
exemplary embodiment, audio capturing device 110 comprises a microphone. It
is appreciated that audio signals are supplied to HCE 120 and VRE 130
simultaneously so that voice encoding and recognition functions can be
performed in parallel.
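The software-switch embodiment of switch 135 can be pictured with a minimal Python sketch. It assumes, purely for illustration, that memory sub-system 140 is a Python list and that HCE 120 and VRE 130 are plain callables; the names Source, SoftwareSwitch and record_frame are hypothetical. The same captured frame reaches both engines, while the switch lets only one of them write to memory at a time. For simplicity the two engines are called one after the other here, whereas the device described above runs them in parallel.

```python
from enum import Enum


class Source(Enum):
    HCE = "hce"  # high compression encoder 120
    VRE = "vre"  # voice recognition engine 130


class SoftwareSwitch:
    """Hypothetical software switch 135: one producer is coupled to memory at a time."""

    def __init__(self, memory):
        self.memory = memory  # list standing in for memory sub-system 140
        self.selected = None  # producer currently coupled to memory

    def select(self, source):
        self.selected = source

    def write(self, source, payload):
        # Refuse writes from a producer that is not currently coupled.
        if source is not self.selected:
            raise RuntimeError(f"{source.value} is not coupled to memory")
        self.memory.append((source.value, payload))


def record_frame(frame, hce, vre, switch):
    """Give one captured frame to both engines, then store their outputs in turn."""
    wave = hce(frame)  # the same input frame reaches the encoder...
    text = vre(frame)  # ...and the recognizer (sequential here for simplicity)
    for source, payload in ((Source.HCE, wave), (Source.VRE, text)):
        switch.select(source)  # couple exactly one producer to memory
        switch.write(source, payload)


memory = []
switch = SoftwareSwitch(memory)
record_frame(b"\x01\x02", hce=lambda f: f[:1], vre=lambda f: "hello", switch=switch)
print(memory)  # [('hce', b'\x01'), ('vre', 'hello')]
```

A hardware multiplexer would enforce the same one-at-a-time rule with a select line instead of the `selected` attribute.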
It is appreciated that within the scope of the present invention, memory
sub-system 140 can comprise volatile memory (e.g., random access memory
(RAM)), non-volatile memory (e.g., read only memory (ROM)), and/or data storage
devices such as magnetic or optical disk drives and media (e.g., diskettes, tapes,
cartridges), which are computer readable media for storing information and
instructions. These memory modules of memory sub-system 140 can be
removable to facilitate easy transfer of the data stored therein. In one
embodiment, memory sub-system 140 comprises semiconductor flash memory.


Still referring to Figure 1, memory sub-system 140 is selectively coupled
to both a decoder 150 and a display sub-system 180 through an intelligent
switch 145. More particularly, switch 145 is operable to couple memory
sub-system 140 to either decoder 150 or display sub-system 180, but not both, at
any given time. In one embodiment, switch 145 is a multiplexer. In another
embodiment, switch 145 is a software switch. In one embodiment, switch 145 is
controlled by the text data generated by VRE 130. Moreover, in an
exemplary embodiment, display sub-system 180 comprises flat panel display
technology, for example, a liquid crystal display (LCD).


With reference still to Figure 1, decoder 150 is further coupled to a
digital-to-analog (D/A) converter 160. Moreover, D/A converter 160 is coupled to an
amplifier 165, which is in turn coupled to an audio output device 170. In one
embodiment, audio output device 170 comprises a speaker.


With reference still to Figure 1, in one embodiment, an editing sub-system
190 is coupled to memory sub-system 140. In this embodiment, editing
sub-system 190 can include an alphanumeric input device having alphanumeric and
function keys to allow user editing of the text data. Editing sub-system 190 can
also include a cursor control or directing device to facilitate text editing and
command selection by a user. The cursor control device allows the user to
dynamically signal the two-dimensional movement of a visible symbol (cursor) on
a screen of display sub-system 180. Many implementations of cursor control
devices are known in the art, including a trackball, mouse, touch pad, joystick, or
special keys on the alphanumeric input device capable of signaling movement in
a given direction or manner of displacement. Alternatively, it will be appreciated
that a cursor can be directed and/or activated via input from the alphanumeric
input device using special keys and key sequence commands. The present
invention is also well suited to directing a cursor by other means such as, for
example, voice commands. Moreover, editing sub-system 190 can further
include a printing device for generating paper copies of the text data.


Operation of a Digital Recording and Playback System of the Present Invention 

Referring next to Figure 2A, a flow diagram 200 illustrating steps for
performing recording using system 100 of Figure 1 in accordance with one
embodiment of the present invention is shown. In step 210, system 100 receives
a voice input using audio capturing device 110.

In step 220, system 100 of Figure 1 generates digital wave data from the
voice input using HCE 120. In an exemplary embodiment, HCE 120 of system
100 can compress the voice input to an output bit rate of two kilobits per second
(2 kbit/s). It is appreciated that the high level of compression of the digital wave
data in accordance with the present invention advantageously reduces the
amount of memory that is required to store the digital wave data.
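A short calculation makes the memory savings at 2 kbit/s concrete. The uncompressed reference format (8 kHz, 16-bit mono PCM) is an assumption chosen only for comparison and is not specified for system 100; 1 kbit is taken as 1000 bits.

```python
def storage_bytes(bit_rate_bps, seconds):
    """Bytes needed to hold `seconds` of audio at `bit_rate_bps` bits per second."""
    return bit_rate_bps * seconds // 8


HCE_RATE_BPS = 2_000       # exemplary HCE 120 output: 2 kbit/s
PCM_RATE_BPS = 8_000 * 16  # assumed reference: uncompressed 8 kHz, 16-bit mono PCM

ONE_HOUR = 3_600
print(storage_bytes(HCE_RATE_BPS, ONE_HOUR))  # 900000 (about 0.9 MB)
print(storage_bytes(PCM_RATE_BPS, ONE_HOUR))  # 57600000 (about 57.6 MB)
```

Under these assumptions, one hour of digital wave data occupies 1/64 of the space the uncompressed reference would need.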
In step 230, system 100 of Figure 1 generates text data from the voice
input using VRE 130. In one embodiment, VRE 130 of system 100 uses Hidden
Markov Model (HMM) techniques to perform voice recognition, although other
voice recognition techniques can also be used within the scope of the present
invention. It is also appreciated that the text data can be in any of a wide variety
of formats. In an exemplary embodiment, the text data is generated in Hypertext
Markup Language (HTML) format.
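One widely used HMM technique is Viterbi decoding, which finds the most probable sequence of hidden states (for example, phones) for a sequence of observed acoustic symbols. The sketch below is the textbook algorithm in the log domain. The two states "H" and "I", the observation symbols "x" and "y", and all probabilities are invented toy values, not parameters of VRE 130, and a practical recognizer would use continuous acoustic features and far larger models.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `observations` (log domain)."""
    # best[t][s] = (log-prob of the best path ending in state s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max((prev[p][0] + math.log(trans_p[p][s]), p) for p in states)
            column[s] = (score + math.log(emit_p[s][obs]), back)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(best[-1], key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Toy model: two hypothetical phones emitting two quantized acoustic symbols.
states = ("H", "I")
start_p = {"H": 0.8, "I": 0.2}
trans_p = {"H": {"H": 0.6, "I": 0.4}, "I": {"H": 0.1, "I": 0.9}}
emit_p = {"H": {"x": 0.7, "y": 0.3}, "I": {"x": 0.2, "y": 0.8}}

print(viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p))  # ['H', 'H', 'I', 'I']
```

Working with log-probabilities keeps long utterances from underflowing to zero, which is why the sketch adds logs instead of multiplying probabilities.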



Referring still to Figure 2A, in step 240, system 100 of Figure 1 stores the
digital wave data and the text data as mixed data in memory sub-system 140 in a
synchronized manner. More specifically, in one embodiment, steps 220 and 230
are performed concurrently, and the generated digital wave data and text data
are sent to memory sub-system 140 via switch 135 in alternating fashion such
that a particular portion of the digital wave data is correlated with the
corresponding portion of the text data as they are stored as mixed data. In
an exemplary embodiment, the present invention employs a buffering
mechanism in conjunction with switch 135 to handle timing delays that may arise
during the voice recognition process (e.g., when digital wave data is generated
more quickly by HCE 120 than the corresponding text data is generated by
VRE 130), ensuring that corresponding portions of digital wave data and text
data are synchronized when they are stored in memory sub-system 140.
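The buffering mechanism can be illustrated with a first-in, first-out queue, as a minimal sketch that assumes both engines emit their portions in the same order and that each wave portion has exactly one matching text portion. The class name SyncBuffer and its methods are hypothetical. Wave portions that arrive early wait in the queue, and each text portion releases its matching wave portion so the pair is written to memory together.

```python
from collections import deque


class SyncBuffer:
    """Hypothetical buffer: holds HCE output until the VRE catches up."""

    def __init__(self):
        self.pending_wave = deque()  # wave portions still waiting for their text
        self.memory = []             # list standing in for memory sub-system 140

    def on_wave(self, wave):
        self.pending_wave.append(wave)

    def on_text(self, text):
        if not self.pending_wave:
            raise RuntimeError("text arrived before its wave portion")
        # The oldest pending wave portion corresponds to this text portion.
        self.memory.append(("wave", self.pending_wave.popleft()))
        self.memory.append(("text", text))


buf = SyncBuffer()
buf.on_wave(b"w1")
buf.on_wave(b"w2")  # HCE runs ahead of the VRE
buf.on_text("t1")
buf.on_wave(b"w3")
buf.on_text("t2")
buf.on_text("t3")
print(buf.memory)
```

Even though wave portions arrive ahead of their text, memory ends up holding strictly alternating wave/text pairs.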



Referring next to Figure 2B, a diagram illustrating one embodiment of an
arrangement of corresponding portions of voice data and text data as stored in a
portable digital recording and playback system 100 in accordance with the
present invention is shown. In an exemplary embodiment as shown in Figure
2B, a voice input is converted into portions 261, 262 and 263 of digital wave data
and corresponding portions 271, 272 and 273 of text data. These portions of
digital wave data and text data are then stored in memory sub-system 140 as
mixed data such that respective portions of digital wave data and text data are
synchronized. More specifically, in one embodiment, the data portions are
stored in alternating fashion such that a particular portion of the digital wave data
is correlated with the corresponding portion of the text data (e.g., digital wave
data portion 261 with text data portion 271; digital wave data portion 262 with
text data portion 272; digital wave data portion 263 with text data portion 273).
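The alternating layout of Figure 2B can be made concrete with a serialization sketch. The specification does not define an on-media format, so the record structure here is an assumption: a one-byte tag (b"W" for a digital wave data portion, b"T" for a text data portion), a 4-byte big-endian length, then the payload, with text stored as UTF-8. The functions pack_mixed and unpack_mixed are hypothetical names.

```python
import struct

WAVE, TEXT = b"W", b"T"  # hypothetical record tags


def pack_mixed(pairs):
    """Serialize (wave_bytes, text_str) pairs as alternating tagged records."""
    out = bytearray()
    for wave, text in pairs:
        for tag, payload in ((WAVE, wave), (TEXT, text.encode("utf-8"))):
            out += tag + struct.pack(">I", len(payload)) + payload
    return bytes(out)


def unpack_mixed(blob):
    """Parse alternating records back into (wave_bytes, text_str) pairs."""
    records, pos = [], 0
    while pos < len(blob):
        tag = blob[pos:pos + 1]
        (length,) = struct.unpack_from(">I", blob, pos + 1)
        start = pos + 5  # skip the 1-byte tag and 4-byte length
        records.append((tag, blob[start:start + length]))
        pos = start + length
    if len(records) % 2:
        raise ValueError("unpaired record at end of mixed data")
    pairs = []
    for (wtag, wave), (ttag, text) in zip(records[0::2], records[1::2]):
        if (wtag, ttag) != (WAVE, TEXT):
            raise ValueError("records are not in wave/text order")
        pairs.append((wave, text.decode("utf-8")))
    return pairs


pairs = [(b"\x10\x11", "hello"), (b"\x12", "world")]
blob = pack_mixed(pairs)
print(len(blob))  # 33
print(unpack_mixed(blob) == pairs)  # True
```

Keeping each wave record immediately followed by its text record is what preserves the one-to-one correlation (261 with 271, and so on) without a separate index.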


As such, the present invention enables subsequent access and retrieval
of the stored data to be performed efficiently and conveniently: because the text
data and digital wave data are synchronized, the text data can be used to search
for a desired portion of digital wave data, and vice versa. In one embodiment,
switch 135 is controlled based on phonetic group definitions of the text in the
text data.
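Text-based retrieval of audio follows directly from the pairing. In the sketch below, the stored mixed data is assumed to be already parsed into a list of (wave, text) pairs; the helper names find_audio and text_for_audio and the sample contents are hypothetical. A case-insensitive substring search over the text portions returns the positions and wave portions to play, and the reverse lookup returns the text for a given wave portion.

```python
from typing import List, Tuple

Pair = Tuple[bytes, str]  # (digital wave data portion, corresponding text portion)


def find_audio(pairs: List[Pair], query: str) -> List[Tuple[int, bytes]]:
    """Return (index, wave) for every portion whose text contains `query`."""
    q = query.lower()
    return [(i, wave) for i, (wave, text) in enumerate(pairs) if q in text.lower()]


def text_for_audio(pairs: List[Pair], index: int) -> str:
    """Reverse lookup: the text portion stored with wave portion `index`."""
    return pairs[index][1]


pairs = [(b"\x01", "meeting at noon"), (b"\x02", "call the supplier"), (b"\x03", "Noon works")]
print(find_audio(pairs, "noon"))  # [(0, b'\x01'), (2, b'\x03')]
print(text_for_audio(pairs, 1))   # call the supplier
```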

By performing real-time voice recognition on the voice input to generate
text data, embodiments of the present invention eliminate the post-processing
that is typically required in prior art systems in order to derive text data from
stored voice data. Moreover, since the text data is generated directly from the
voice input in the present invention and not from highly compressed voice data
as in the prior art, high quality voice-to-text conversion is achieved. In addition,
since the present invention does not rely on the stored voice data to generate
the text data, the voice input can be subject to high compression and stored as
digital wave data in accordance with the present invention to advantageously
reduce the amount of memory required for storage without compromising the
quality of the text data.

With reference next to Figure 3, a flow diagram 300 illustrating steps for
performing playback using system 100 of Figure 1 in accordance with one
embodiment of the present invention is shown. In step 310, system 100 of
Figure 1 retrieves the mixed data, which comprises digital wave data and text
data, from memory sub-system 140.

In step 320, system 100 of Figure 1 decodes the digital wave data into
audio data using decoder 150. In step 330, system 100 converts the audio data
into an analog signal using D/A converter 160. In optional step 340, in one
embodiment, system 100 amplifies the analog signal using amplifier 165. In step
350, system 100 generates, using audio output device 170, a voice output
corresponding to the voice input from the analog signal.
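Steps 330 and 340 take place in hardware, but a numerical model helps make the signal path concrete. The sketch assumes, for illustration only, that decoder 150 outputs signed 16-bit samples, that D/A converter 160 is ideal with a reference voltage `v_ref`, and that amplifier 165 applies a fixed gain and clips at a supply rail. The values of `v_ref`, `gain` and `rail` are hypothetical.

```python
def dac(samples, v_ref=1.0):
    """Ideal 16-bit D/A: map signed samples in [-32768, 32767] to volts in [-v_ref, v_ref)."""
    return [s / 32768 * v_ref for s in samples]


def amplify(volts, gain=2.0, rail=1.5):
    """Optional step 340: apply gain, then clip to a hypothetical supply rail."""
    return [max(-rail, min(rail, v * gain)) for v in volts]


decoded = [0, 16384, -32768]  # example decoded samples from step 320
analog = dac(decoded)         # step 330
print(analog)                 # [0.0, 0.5, -1.0]
print(amplify(analog))        # [0.0, 1.0, -1.5]  (the last value is clipped from -2.0)
```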

It is appreciated that the present invention provides a high quality voice 
output. More specifically, the voice output is based on the recorded voice input 
(as digital wave data) and is a high fidelity reproduction thereof, and not based 
on a simulated voice generated using text data. 

With reference still to Figure 3, in step 360, system 100 of Figure 1
displays the text data using display sub-system 180. More specifically, in one
embodiment, the retrieved digital wave data and text data are sent to decoder
150 and display sub-system 180 via switch 145 in alternating fashion such that
output of the digital wave data by audio output device 170 and display of the text
data by display sub-system 180 are synchronized. As such, the present invention
affords great convenience to the reviewer of the recorded voice and text. In one
embodiment, switch 145 is controlled based on phonetic group definitions of the
text in the text data.
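The alternating routing through switch 145 during playback can be sketched as a loop over the stored pairs. In this sketch, the callables `decode`, `speak` and `show` are hypothetical stand-ins for decoder 150, the audio path ending at audio output device 170, and display sub-system 180. Showing each text portion just before its audio plays is one plausible ordering choice, not one the specification prescribes.

```python
def play_back(pairs, decode, speak, show):
    """Alternate each stored (wave, text) pair between the display and the audio path."""
    for wave, text in pairs:
        show(text)           # switch 145 -> display sub-system 180
        speak(decode(wave))  # switch 145 -> decoder 150 -> D/A -> audio output


events = []
play_back(
    [(b"a", "one"), (b"b", "two")],
    decode=lambda w: w.upper(),  # placeholder "decoder"
    speak=lambda audio: events.append(("audio", audio)),
    show=lambda text: events.append(("text", text)),
)
print(events)  # [('text', 'one'), ('audio', b'A'), ('text', 'two'), ('audio', b'B')]
```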



It is appreciated that embodiments of the present invention can operate
for extended periods of time under battery power (e.g., disposable batteries,
rechargeable batteries) because components of system 100 (Figure 1) in
accordance with the present invention do not consume power at a high rate.
Thus, the present invention provides a portable digital recording and playback
system which is operable under battery power and wherein high quality text data
is generated from a voice input without requiring post-recording conversion.



Moreover, it is appreciated that system 100 of Figure 1 in accordance with
embodiments of the present invention does not require specialized circuit
components or extensive retrofitting of existing computer systems and other
appliances, because the circuit elements required for its implementation are
commonly used in today's electronic appliances and are fully compatible with
existing computer systems and other appliances. As such, the present invention
provides a portable, battery-powered digital recording and playback system which
does not require post-processing to generate high quality text data and which is
conducive to use with existing computer systems and other appliances.



It is further appreciated that although exemplary values and operational
details (e.g., the compression ratio of HCE 120 and the voice recognition
techniques used in VRE 130) are given for various components with respect to
the embodiments of the present invention described above, such values and
details are illustrative only and can vary within the scope and spirit of the
present invention.

The preferred embodiment of the present invention, a digital recording
and playback system with built-in voice recognition capability for concurrent text
generation, is thus described. While the present invention has been described in
particular embodiments, it should be appreciated that the present invention
should not be construed as limited by such embodiments, but rather construed
according to the claims below.






